Name | Version | Summary | Date |
ts-tokenizer | 0.1.19 | TS Tokenizer is a hybrid (lexicon-based and rule-based) tokenizer designed specifically for tokenizing Turkish texts. | 2025-01-30 19:59:44 |
tikara | 0.1.5 | The metadata and text content extractor for almost every file type. | 2025-01-26 23:33:40 |
safwaText | 0.1.0 | A Python package for Arabic text preprocessing, including cleaning, normalization, stemming, and stopword removal. | 2025-01-24 16:07:06 |
long2short | 0.1.3 | A flexible text summarization library to summarize long documents supporting multiple LLM providers | 2025-01-23 11:13:46 |
reliq | 0.0.32 | Python ctypes bindings for reliq | 2025-01-21 16:56:17 |
PyTokenCounter | 1.4.0 | A Python library for tokenizing text and counting tokens using various encoding schemes. | 2025-01-13 03:30:06 |
chonkie | 0.4.1 | 🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library | 2025-01-07 13:22:53 |
indoxMiner | 0.1.4 | Indox Data Extraction | 2024-12-29 09:52:42 |
yurenizer | 0.2.2 | A library for standardizing terms with spelling variations using a synonym dictionary. | 2024-12-08 08:03:52 |
huggingface-text-data-analyzer | 1.1.0 | A comprehensive tool for analyzing text datasets from HuggingFace's datasets library | 2024-12-06 03:06:41 |
pawpaw | 1.0.0rc8 | High Performance Text Processing & Segmentation Framework | 2024-11-18 04:28:31 |
analiticcl | 0.4.8 | Analiticcl is an approximate string matching or fuzzy-matching system that can be used to find variants for spelling correction or text normalisation | 2024-10-17 19:47:08 |
sesdiff | 0.3.2 | Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (levenshtein). This is the Python binding. | 2024-10-15 08:59:03 |
abbreviation-extractor | 0.1.4 | A library for extracting abbreviations from text. | 2024-09-14 20:02:14 |
html-to-markdown | 1.1.0 | Convert HTML to markdown | 2024-09-09 06:26:33 |
tokenize-text | 0.2.32 | Tokenizing and processing text inputs with transformer models | 2024-08-11 21:53:24 |
tokenize-transformer | 0.2.14 | Tokenizing and processing text inputs with transformer models | 2024-08-01 18:39:29 |
flashtext2 | 1.1.0 | A package for extracting keywords from large text very quickly (much faster than regex and the original flashtext package) | 2024-07-04 14:40:37 |
roter | 2024.6.25 | Rotate and combine tables (Danish: Roter og kombiner borde). | 2024-06-24 19:22:08 |